We start with confidence intervals in a simple Gaussian setting. We have \(X_1, \ldots, X_n \sim_{i.i.d.} \mathcal{N}(\mu, \sigma^2)\), where \(\mu\) and \(\sigma\) are unknown (to be estimated and/or tested).
The maximum likelihood estimator of \((\mu, \sigma^2)\) is \((\overline{X}_n, \widehat{\sigma}^2)\), where
\[\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\quad\text{and}\quad \widehat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n (X_i - \overline{X}_n)^2.\] By Student’s theorem, \(\overline{X}_n\) and \(\widehat{\sigma}^2\) are stochastically independent, \(\overline{X}_n \sim \mathcal{N}(\mu, \sigma^2/n)\), and \(n \widehat{\sigma}^2/\sigma^2 \sim \chi^2_{n-1}\).
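These facts can be checked numerically. The sketch below uses plain Python (standard library only). It computes the MLE, noting the \(1/n\) in \(\widehat{\sigma}^2\). By Monte Carlo, it then checks three things:

- \(n\widehat{\sigma}^2/\sigma^2\) has the \(\chi^2_{n-1}\) mean \(n-1\) and variance \(2(n-1)\).
- \(\overline{X}_n\) has variance \(\sigma^2/n\).
- The sample covariance between \(\overline{X}_n\) and \(\widehat{\sigma}^2\) is close to 0, which is consistent with independence but does not prove it.

The values of \(\mu\), \(\sigma\), \(n\), the number of replicates and the seed are arbitrary, and the helper name `gaussian_mle` is our own.

```python
import random
import statistics


def gaussian_mle(xs):
    """MLE (mu_hat, sigma2_hat) of an i.i.d. Gaussian sample; sigma2_hat divides by n, not n - 1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


print(gaussian_mle([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)

# Monte Carlo check of Student's theorem (arbitrary mu, sigma, n, n_rep).
rng = random.Random(2024)
mu, sigma, n, n_rep = 1.0, 2.0, 10, 20000
means, scaled = [], []
for _ in range(n_rep):
    m, s2 = gaussian_mle([rng.gauss(mu, sigma) for _ in range(n)])
    means.append(m)
    scaled.append(n * s2 / sigma**2)  # should be chi^2_{n-1} distributed

print(statistics.fmean(scaled), statistics.variance(scaled))  # near n - 1 = 9 and 2(n - 1) = 18
print(statistics.fmean(means), statistics.variance(means))    # near mu = 1 and sigma^2 / n = 0.4

# Sample covariance of (xbar, n * sigma2_hat / sigma^2): near 0 under independence.
m_bar, s_bar = statistics.fmean(means), statistics.fmean(scaled)
cov = statistics.fmean([(a - m_bar) * (b - s_bar) for a, b in zip(means, scaled)])
print(cov)
```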
Simulate \(N=1000\) Gaussian samples of size \(n=100\).
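For each sample, compute the confidence interval for \(\mu\) at level \(1-\alpha\), \(\overline{X}_n \pm t_{n-1,1-\alpha/2}\,\widehat{\sigma}/\sqrt{n-1}\), where \(t_{n-1,1-\alpha/2}\) is the \(1-\alpha/2\) quantile of \(t_{n-1}\). Report the empirical coverage (the fraction of intervals containing \(\mu\)) for \(\alpha=5\%\) and \(\alpha=10\%\).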
Plot a histogram of the replicates of the pivot \(\sqrt{n-1}\,(\overline{X}_n - \mu)/\widehat{\sigma}\). Overlay the density of \(t_{n-1}\), which is its distribution by Student’s theorem.
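The simulation can be sketched in plain Python using only the standard library. Because there is no t-distribution helper, the \(t_{n-1}\) density is coded by hand, probabilities come from Simpson's rule, and quantiles come from bisection.

One set of replicates serves both questions. The true \(\mu\) lies in \(\overline{X}_n \pm t_{n-1,1-\alpha/2}\,\widehat{\sigma}/\sqrt{n-1}\) exactly when the pivot \(T=\sqrt{n-1}\,(\overline{X}_n-\mu)/\widehat{\sigma}\) satisfies \(|T| \le t_{n-1,1-\alpha/2}\).

The choices \(\mu=2\), \(\sigma=3\) and the seed are arbitrary. The helper names are our own. Instead of drawing a plot (in R one would use `hist()` and `curve()`), the sketch prints a text histogram next to the \(t_{n-1}\) probability of each bin.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_integral(a, b, df, steps=2000):
    """P(a <= T <= b) for T ~ t_df, by Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    return s * h / 3


def t_quantile(p, df):
    """Quantile of t_df for 0.5 < p < 1, by bisection on the CDF 0.5 + P(0 <= T <= q)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 0.5 + t_integral(0.0, mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pivots(mu, sigma, n, n_rep, rng):
    """Replicates of sqrt(n - 1) * (xbar - mu) / sigma_hat, with sigma_hat the MLE (1/n)."""
    out = []
    for _ in range(n_rep):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        sigma_hat = math.sqrt(statistics.pvariance(xs, mu=xbar))
        out.append(math.sqrt(n - 1) * (xbar - mu) / sigma_hat)
    return out


n, N = 100, 1000
T = pivots(mu=2.0, sigma=3.0, n=n, n_rep=N, rng=random.Random(42))

# mu lies in xbar +/- q * sigma_hat / sqrt(n - 1) exactly when |T| <= q.
coverage = {}
for alpha in (0.05, 0.10):
    q = t_quantile(1 - alpha / 2, n - 1)
    coverage[alpha] = sum(abs(t) <= q for t in T) / N
    print(f"alpha = {alpha:.0%}: empirical coverage {coverage[alpha]:.3f} (nominal {1 - alpha:.0%})")

# Text histogram of T on [-4, 4) next to the t_{n-1} probability of each bin.
edges = [-4 + 0.5 * k for k in range(17)]
for a, b in zip(edges[:-1], edges[1:]):
    observed = sum(a <= t < b for t in T) / N
    expected = t_integral(a, b, n - 1, steps=200)
    print(f"[{a:+.1f}, {b:+.1f})  observed {observed:.3f}  t_{n - 1} {expected:.3f}  " + "#" * round(200 * observed))
```

The hand-written quantile can be sanity-checked against closed forms. For example, \(t_1\) is the Cauchy distribution, whose \(0.975\) quantile is \(\tan(0.475\pi)\).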
Testing independence
In data gathered from the 2000 General Social Survey (GSS), one cross-classifies gender and political party identification. Respondents indicated whether they identified more strongly with the Democratic or Republican party or as Independents. This is summarized in the following contingency table (taken from Agresti, An Introduction to Categorical Data Analysis).
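Turn the three-way contingency table UCBAdmissions into a data frame/tibble with columns Gender, Dept, Admit and n. UCBAdmissions covers applicants to graduate school at UC Berkeley in 1973 for the six largest departments, cross-classified by Admit, Gender and Dept. In the data frame, the first three columns are categorical, and n counts the applicants with the corresponding values of the first three columns.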
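The requested long format has one row per (Gender, Dept, Admit) cell and a count column n. In R, `as.data.frame(UCBAdmissions)` already produces this layout, with the count column named `Freq`.

The plain-Python sketch below mimics it. The counts are transcribed by hand from R's `datasets::UCBAdmissions` and are worth double-checking against the R object. The names `UCB`, `Row` and `to_long` are our own.

```python
from collections import namedtuple

# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}

# One record per cell of the three-way table, with columns Gender, Dept, Admit, n.
Row = namedtuple("Row", ["Gender", "Dept", "Admit", "n"])


def to_long(counts):
    """Flatten {(dept, gender): (admitted, rejected)} into a list of Row records."""
    rows = []
    for (dept, gender), (admitted, rejected) in counts.items():
        rows.append(Row(gender, dept, "Admitted", admitted))
        rows.append(Row(gender, dept, "Rejected", rejected))
    return rows


ucb_long = to_long(UCB)
print(len(ucb_long), sum(row.n for row in ucb_long))  # 24 cells, 4526 applicants
print(ucb_long[0])  # Row(Gender='Male', Dept='A', Admit='Admitted', n=512)
```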
Question
Make it a bivariate sample by focusing on Gender and Admit: compute the marginal table.
Draw the corresponding mosaic plot and compute the chi-square statistic for independence.
Comment.
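The numerical part of this question can be sketched in plain Python; the mosaic plot itself is a job for R's `mosaicplot()`. The sketch collapses the counts over Dept and computes Pearson's statistic. It uses no continuity correction, whereas R's `chisq.test` applies Yates' correction to \(2\times 2\) tables by default and so reports a slightly smaller value.

With one degree of freedom, the p-value follows from \(P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})\). The counts are hand-transcribed from `datasets::UCBAdmissions`, and the helper name `pearson_chi2` is our own.

```python
import math

# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}


def pearson_chi2(table):
    """Pearson X^2 statistic and degrees of freedom for an r x c table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            x2 += (observed - expected) ** 2 / expected
    return x2, (len(row_tot) - 1) * (len(col_tot) - 1)


# Collapse over Dept: marginal Gender x Admit table, entries [admitted, rejected].
margin = {"Male": [0, 0], "Female": [0, 0]}
for (dept, gender), (admitted, rejected) in UCB.items():
    margin[gender][0] += admitted
    margin[gender][1] += rejected

x2, df = pearson_chi2([margin["Male"], margin["Female"]])
p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail of chi^2_1
print(margin)
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.2e}")
```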
Question
Visualize the three-way contingency table using double-decker plots from the vcd package.
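Without vcd at hand, the quantities a double-decker plot displays can still be tabulated. For each department and gender, the plot shows the cell size (as bar width) and the proportion admitted (as the shaded part of the bar).

The plain-Python sketch below prints these quantities as a text chart. The layout is our own and does not reproduce vcd's `doubledecker()`. The counts are hand-transcribed from `datasets::UCBAdmissions`.

```python
# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}

rates = {}
for (dept, gender), (admitted, rejected) in UCB.items():
    size = admitted + rejected                 # bar width in a double-decker plot
    rates[(dept, gender)] = admitted / size    # shaded share of that bar
    bar = "#" * round(40 * rates[(dept, gender)])
    print(f"Dept {dept}  {gender:<6}  n = {size:4d}  admitted {rates[(dept, gender)]:6.1%}  {bar}")
```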
Question
Looking at the UCBAdmissions dataset, which variable would you call the response? Which variables would you call covariates?
Test independence between Gender and Dept.
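Collapsing over Admit gives a \(2\times 6\) Gender by Dept table. Its Pearson statistic has \((2-1)(6-1)=5\) degrees of freedom under independence.

The plain-Python sketch below computes the statistic. The standard library has no \(\chi^2\) distribution, so the p-value uses the exact closed forms of the \(\chi^2\) upper tail for integer degrees of freedom. The helper names `pearson_chi2` and `chi2_sf` are our own, and the counts are hand-transcribed from `datasets::UCBAdmissions`.

```python
import math

# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}


def pearson_chi2(table):
    """Pearson X^2 statistic and degrees of freedom for an r x c table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            x2 += (observed - expected) ** 2 / expected
    return x2, (len(row_tot) - 1) * (len(col_tot) - 1)


def chi2_sf(x, df):
    """Upper tail P(X > x) of chi^2_df for integer df (closed forms of the incomplete gamma)."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:  # even df: exp(-h) * sum_{j < df/2} h^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # odd df: erfc(sqrt(h)) + sum_{j=1}^{(df-1)/2} h^(j - 1/2) exp(-h) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df + 1) // 2):
        total += math.exp((j - 0.5) * math.log(h) - h - math.lgamma(j + 0.5))
    return total


# Collapse over Admit: rows = Male, Female; columns = departments A..F.
table = [[sum(UCB[(d, g)]) for d in "ABCDEF"] for g in ("Male", "Female")]
x2, df = pearson_chi2(table)
print(table)
print(f"X^2 = {x2:.1f}, df = {df}, p-value = {chi2_sf(x2, df):.3g}")
```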
Question
For each department of application (Dept), extract the partial two-way table for Gender and Admit, and test each two-way table for independence. For how many departments is independence rejected at significance level \(1\%\)? At \(5\%\)?
Note that the two-way cross-sectional slices of the three-way table are called partial tables.
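The department-by-department tests can be sketched in plain Python. Each partial table is \(2\times 2\), so Pearson's statistic has one degree of freedom and \(P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})\) gives the p-value.

No continuity correction is applied here. R's `chisq.test` uses Yates' correction for \(2\times 2\) tables by default, so its statistics come out somewhat smaller. The counts are hand-transcribed from `datasets::UCBAdmissions`, and `pearson_chi2` is our own helper.

```python
import math

# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}


def pearson_chi2(table):
    """Pearson X^2 statistic and degrees of freedom for an r x c table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            x2 += (observed - expected) ** 2 / expected
    return x2, (len(row_tot) - 1) * (len(col_tot) - 1)


p_values = {}
for dept in "ABCDEF":
    # Partial table: rows = Male, Female; columns = admitted, rejected.
    partial = [list(UCB[(dept, "Male")]), list(UCB[(dept, "Female")])]
    x2, df = pearson_chi2(partial)                  # df = 1 for a 2 x 2 table
    p_values[dept] = math.erfc(math.sqrt(x2 / 2))   # upper tail of chi^2_1
    print(f"Dept {dept}: X^2 = {x2:6.2f}, p-value = {p_values[dept]:.3g}")

for level in (0.01, 0.05):
    rejected = [d for d, p in p_values.items() if p < level]
    print(f"level {level:.0%}: independence rejected in {len(rejected)} department(s): {rejected}")
```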
What we observed has a name.
Simpson’s paradox
The situation in which a marginal association has a different direction from the conditional associations is called Simpson’s paradox. It applies to quantitative as well as categorical variables.
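On the UCB data, the marginal association (all departments pooled) can be compared directly with the conditional ones (within each department). The plain-Python sketch below computes admission rates by gender both ways. The counts are hand-transcribed from `datasets::UCBAdmissions`, and the helper `admission_rate` is our own.

```python
# (admitted, rejected) per (Dept, Gender), transcribed from R's datasets::UCBAdmissions.
UCB = {
    ("A", "Male"): (512, 313), ("A", "Female"): (89, 19),
    ("B", "Male"): (353, 207), ("B", "Female"): (17, 8),
    ("C", "Male"): (120, 205), ("C", "Female"): (202, 391),
    ("D", "Male"): (138, 279), ("D", "Female"): (131, 244),
    ("E", "Male"): (53, 138), ("E", "Female"): (94, 299),
    ("F", "Male"): (22, 351), ("F", "Female"): (24, 317),
}
DEPTS = "ABCDEF"


def admission_rate(cells):
    """Proportion admitted among a list of (admitted, rejected) cells pooled together."""
    admitted = sum(a for a, _ in cells)
    return admitted / sum(a + r for a, r in cells)


# Marginal association: pool all departments.
marginal = {g: admission_rate([UCB[(d, g)] for d in DEPTS]) for g in ("Male", "Female")}
print("pooled admission rates:", {g: round(r, 3) for g, r in marginal.items()})

# Conditional associations: compare the two genders within each department.
women_higher = [d for d in DEPTS
                if admission_rate([UCB[(d, "Female")]]) > admission_rate([UCB[(d, "Male")]])]
print("departments where women have the higher admission rate:", women_higher)
```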